Size Bounds for Conjunctive Queries with General Functional Dependencies
Abstract
This paper extends the work of Gottlob, Lee, and Valiant (PODS 2009) [9], and considers worst-case bounds for the size of the result Q(D) of a conjunctive query Q to a database D given an arbitrary set of functional dependencies. The bounds in [9] are based on a “coloring” of the query variables. In order to extend the previous bounds to the setting of arbitrary functional dependencies, we leverage tools from information theory to formalize the original intuition that each color used represents some possible entropy of that variable, and bound the maximum possible size increase via a linear program that seeks to maximize how much more entropy is in the result of the query than the input. This new view allows us to precisely characterize the entropy structure of worst-case instances for conjunctive queries with simple functional dependencies (keys), providing new insights into the results of [9]. We extend these results to the case of general functional dependencies, providing upper and lower bounds on the worst-case size increase. We identify the fundamental connection between the gap in these bounds and a central open question in information theory. Finally, we show that, while both the upper and lower bounds are given by exponentially large linear programs, one can distinguish in polynomial time whether the result of a query with an arbitrary set of functional dependencies can be any larger than the input database.
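The notion of "size increase" in the abstract can be made concrete on a toy instance. The sketch below is not the paper's linear program. It brute-forces the triangle query Q(x, y, z) :- R(x, y), S(y, z), T(x, z), chosen here purely as an illustration, and reports log|Q(D)| / log(rmax), where rmax is the size of the largest input relation. The function names `triangle_join` and `size_exponent`, the domain size `m`, and the choice of instances are all assumptions made for this example.

```python
import math
from itertools import product


def triangle_join(R, S, T):
    """Evaluate Q(x, y, z) :- R(x, y), S(y, z), T(x, z) on sets of pairs."""
    # Index S by its first column so each R-tuple only scans matching z values.
    s_by_y = {}
    for y, z in S:
        s_by_y.setdefault(y, []).append(z)
    t_set = set(T)
    return {
        (x, y, z)
        for x, y in R
        for z in s_by_y.get(y, [])
        if (x, z) in t_set
    }


def size_exponent(R, S, T):
    """Return log|Q(D)| / log(rmax); assumes rmax > 1 and a non-empty result."""
    out = triangle_join(R, S, T)
    rmax = max(len(R), len(S), len(T))
    return math.log(len(out)) / math.log(rmax)


m = 4  # hypothetical domain size for every variable
full = set(product(range(m), repeat=2))

# No functional dependencies: every relation holds all m^2 pairs,
# so the result has m^3 tuples while each input has m^2.
no_fd = size_exponent(full, full, full)

# Impose the functional dependency x -> y on R by making y a function of x.
# Each result tuple is then determined by its (x, z) pair, which lies in T,
# so the result cannot be larger than T.
r_with_fd = {(x, x) for x in range(m)}
with_fd = size_exponent(r_with_fd, full, full)

print(f"exponent without FDs:      {no_fd:.3f}")
print(f"exponent with x -> y on R: {with_fd:.3f}")
```

On these two instances the functional dependency removes the size increase entirely, which is the kind of distinction the abstract's final sentence says can be decided in polynomial time. The instances are hand-picked, so the sketch shows achievable exponents for this one query rather than proving a bound.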
Related papers
Entropy Bounds for Conjunctive Queries with Functional Dependencies
We study the problem of finding the worst-case size of the result Q(D) of a fixed conjunctive query Q applied to a database D satisfying given functional dependencies. We provide a characterization of this bound in terms of entropy vectors, and in terms of finite groups. In particular, we show that an upper bound provided by Gottlob, Lee, Valiant and Valiant [9] is tight, and that a corresponde...
Sensitivity of Counting Queries
In the context of statistical databases, the release of accurate statistical information about the collected data often puts at risk the privacy of the individual contributors. The goal of differential privacy is to maximise the utility of a query while protecting the individual records in the database. A natural way to achieve differential privacy is to add statistical noise to the result of t...
Discovery and Application of Functional Dependencies in Conjunctive Query Mining
We present an algorithm for mining frequent queries in arbitrary relational databases, over which functional dependencies are assumed. Building upon previous results, we restrict to the simple, but appealing subclass of simple conjunctive queries. The proposed algorithm makes use of the functional dependencies of the database to optimise the generation of queries and prune redundant queries. Fu...
Computing Supports of Conjunctive Queries on Relational Tables with Functional Dependencies
The problem of mining all frequent queries on a relational table is a problem known to be intractable even for conjunctive queries. In this article, we restrict our attention to conjunctive projection-selection queries and we assume that the table to be mined satisfies a set of functional dependencies. Under these assumptions, we define and characterize two pre-orderings with respect to which t...
Comparing and Mining Conjunctive Queries from a Relational Table with Functional Dependencies
In this paper we study the problem of mining all frequent queries in a relational table, a problem known to be intractable even for conjunctive queries. We restrict our attention to projection-selection queries and we assume that the table to be mined satisfies a set of functional dependencies. Under these assumptions, we define two pre-orderings with respect to which the support measure is show...
Journal: CoRR
Volume: abs/0909.2030
Publication date: 2009